Version: 7.9

Upgrade Best Practices

Use Case Overview

The purpose of this document is to guide users through a successful upgrade or update to a later version of Resolve Actions Pro. The sections below detail the upgrade actions and highlight the potential risks to consider when planning the upgrade.

Before starting any upgrade, plan the steps & know your system and the risks

  • Know your current Actions Pro architecture
  • Know your proposed Actions Pro architecture if a change is required or requested
  • Know how to test and validate all integrations before all Runbooks
  • Know what you are planning to get out of the update / upgrade (take baseline measurements if required)
  • Know any load testing / system performance requirements
  • Verify backups as needed prior to upgrade procedure
  • Verify and Update Steps Execute Order (I.E. MOP)
  • Verify back out plans
  • Classify all Integrations and Runbooks by Priority
  • Verify your test plan
  • Verify Go/No-Go requirements
  • Prepare any test data and systems
  • Test all possible scenarios in UAT
  • Prepare to load test UAT and PROD
  • (If Elasticsearch updated) Backup and Restore or Archive
  • QA Based - Testing Requirement
  • Production Based - Testing Requirement

Understanding Version Differences

This section is designed to highlight the differences between versions of Actions Pro and allow for successful planning and preparation of the upgrade process.

note
  • Actions Pro automation developments (automations, wikis, etc.) are not always backwards compatible between Actions Pro versions. After upgrading a development environment, take particular care not to deploy its content to a production environment that is still running a previous version.
  • All patches / hotfixes are cumulative in newer releases, so review every intermediate version of Actions Pro to ensure you fully understand the changes implemented. Example: When upgrading from 6.1 to 6.4, evaluate the patches/hotfixes in releases 6.2 and 6.3 and take their impact into consideration. Always check with Resolve Support for the current minimum supported version for any specific major release (6.3.x, 6.4.x, etc.), to ensure you have the latest GA release. The version levels listed below are highlights of each release. Please refer to each version’s Release Notes.

Actions Pro 6.1

  • Elasticsearch Update (greater than 1 version) to version 5.x
    • Requires the complete unload and reload of all Elasticsearch data
    • Rolling updates are not available for 6.0 and 5.x to 6.1 or newer

Actions Pro 6.2

  • Addition of CSRF Guard (Cross-Site Request Forgery)
    • This will impact all REST and Web services integrations.
    • Potential impact to communications and network validation due to CSRF token as part of the URL 
  • For more details, please review the release notes

Actions Pro 6.3

  • Updated Groovy version from 2.4.4 to 2.4.15
  • Some Actions Pro API classes have been moved; please review the related KB article for more details
  • Update to visualization module, changed from EXTJS to React
  • Inclusion of the Security Incident Response (SIR) Dashboard and process set
  • For more details, please review Release Notes 6.3 and 6.3.1
  • For more details, please review the cumulative hotfix notes for 6.3.0.x and 6.3.1.x

Actions Pro 6.4

  • Improved Performance and Scalability
  • Improved Stability and Up Time
  • Updated third-party library versions
  • Removed Atomikos library and refactored HibernateUtil
  • Rolling updates are not available for 6.4.0.9+ due to changes in RSMQ settings. This will also require updating all Standalone RSRemotes at the same time. Updating from 6.4.0.9 to a later version of Actions Pro will allow for rolling updates.
  • For more details, please review Release Notes 6.4.

Actions Pro 6.5

  • Namespace RBAC
    • Namespace is the preferred method of organizing content as defined by customers and based on their specific processes and organizational requirements. RBAC stands for Role-Based Access Control.
  • Version Control and Git Integration
    • This new feature enables acceleration and simplification of automation development and deployment with version label and integration with Git.
  • RSRemote, Gateways and Filters
    • Simplify management of RSRemote, Gateways, and Filters through a centralized administration console
  • Implemented the Log4j2 Framework in v6.5
    • This replaced log4j 
  • For more details, please review the Release Notes 6.5.

Actions Pro 7.0

  • Upgraded Infrastructure with inherent feature improvements, security and bug fixes
    • Migrated from Oracle JDK 8 to OpenJDK 13
    • Upgraded Elasticsearch from 5.4.0 to 7.6.0
    • Upgraded RabbitMQ from 3.6.2 to 3.8.3
    • Upgraded Tomcat from 8.5.23 to 9.0.30
    • Upgraded Erlang 18.3 to 22
    • Upgraded Logstash from 5.4.0 to 7.6.0
  • Packaged pre-built content for faster value delivery
    • Additional content is available to download and import into Actions Pro. Link to content platform can be found in the “About” pop-up dialog.
  • Improved Content Browser search behaviour
    • Using the Global Search functionality now redirects to the Content Browser instead of Search Results Page.
  • New UI theme for people with visual impairment – High Contrast Theme
    • Quality of Life improvement that meets accessibility requirements defined by WCAG 2.0
  • Added a System Property to change the default Task, Event, Pre-condition merge type Policy
  • New RabbitMQ features
    • Support for RabbitMQ vhost in ESB component
    • Support for an optional specific username/password for external Actions Pro components (RSRemotes)
  • Removed Features for v7.0 – Summary
    • Global Search Results page in ExtJS GUI.
    • Advanced Search dialog in ExtJS Search Results page
    • Searching through Global Search now redirects to the Content Browser, which makes the following system properties no longer relevant. They have been removed from the System Properties page:
      • search.content
      • search.default.type
      • search.ui.results.collapse
    • Update Guide v.7.0 documentation has been removed as procedures are now combined under Upgrade Guide v.7.0
    • Document Development documentation is removed. Document development and Page Builder are now unified under the Page Builder document
    • “Business Rules” feature is removed.
  • For more details, please review the Release Notes 7.0

Actions Pro 7.1

  • New Analytics Dashboard Beta is added to the main menu
  • Rate Limiting
  • Support for user names longer than 40 characters
  • Enforce strong user password
  • Removed all plain text username and passwords from the URL
  • Runbooks marked as "hidden" are no longer listed in Content Browser.
  • Added new Resolve Content Namespace
  • Added auto-refresh user setting for Package Manager and Content Browser pages
  • Support of Namespace for Action Task Properties
  • Kibana service installation is included in Actions Pro 7.1
  • Integration with HashiCorp Vault
  • For more details, please review the Release Notes 7.1

Planning the Update

Backup

Prior to any changes to the Actions Pro environment, a backup/snapshot must be taken to ensure data integrity and to provide a fallback measure. Please review the attached “System Back-Up Strategy Guide Single Node & HA Clusters”.

Know your current Actions Pro architecture

As additional features and functionality are added to each Actions Pro release, system resource requirements may increase. Before any upgrade, always check the system requirements of the new Actions Pro version and compare them with the existing system builds. Any system hardware upgrades should be completed prior to the Actions Pro software upgrade.

note

Actions Pro versions 6.4 and above are highly scalable, both vertically and horizontally, so this should be considered as part of any re-architecting.

Know your proposed Actions Pro architecture if a change is required or requested

Any changes to system architecture should be undertaken prior to software upgrades. These include:

  • Existing hardware upgrades (memory, CPUs, etc.)
  • Adding nodes to the cluster (horizontal scaling)
  • Actions Pro software component allocation (e.g. moving RSSearch, or RSView to separate servers)

Know how to test and validate all integrations before all Runbooks

In almost all cases Actions Pro will have multiple integration points. We recommend testing each of them separately. Validate both unidirectional and bidirectional communications, as well as the ability to pass data and perform all required functions.

Know what you are planning to get out of the update / upgrade

If this is a simple upgrade with no expansion of runbooks, functionality, or integrations, you still need to know what to test and what the expected outcome should be.

Verify backups as needed prior to upgrade procedure

The importance of this cannot be overstated. In some cases, the original server(s) remain available as you transition to a different server set. In any case, here are the areas you may need to be concerned with:

  • A complete backup of the RDBMS
  • A complete backup of the current Actions Pro installation directory
  • Instructions to rebuild the OS, denoting the Actions Pro user and system configuration and parameters if needed

In some cases, Actions Pro is running on a virtual machine and a simple snapshot to revert to will satisfy all the requirements. If this is the case, please verify that Actions Pro is ‘stopped’ at the time the snapshot is taken. If this is an HA Cluster with three VMs, the same rules apply and all three must be stopped at the time the snapshot is taken.

In other cases a snapshot may not be available, and you will have to validate that the backups listed above exist as needed. See details on the version in its Release Notes.

Verify and Update Steps Execute Order (I.E. MOP)

While there are normally not many steps to update Actions Pro, there is always a risk that some user-defined properties and configurations are overwritten during the process.

Having all of these defined, together with methods to test and validate them, is required for a smooth and successful upgrade.

Verify back out plans

Based on the MOP above, if the allotted time runs out or a No-Go decision point is reached, you will want to make sure the back out steps are documented and followed, and that a test plan is included to confirm the system is operational once again.

Classify all Integrations and Runbooks by Priority

Classify all integrations and runbooks into 3 groups based on your requirements (based on ITIL Service Management processes):

  • Required for production go live
    • You cannot go into production without these working as intended (e.g.):
      • Critical integrations to monitoring and incident ticketing systems
      • Access to critical devices
      • Execution of runbooks as needed to diagnose and connect devices (e.g. alert validation and monitoring)
  • Required on day 1
    • You can go into production without these working as intended, but they must be corrected within a short time period, such as 1 day (e.g.):
      • Required integrations to monitoring and additional systems
      • Access to important devices
      • Execution of runbooks as needed to diagnose and correct devices
  • Required within a time frame
    • You can go into production without these working as intended, but they must be corrected within an agreed time period, for example:
      • Execution of reporting runbooks and dashboards

Verify your test plan

A test and backout plan is only as good as its documentation and your ability to adhere to it. Prepare it, walk through it, and verify it with everyone involved; the importance of this step cannot be stressed enough.

Verify Go/No-Go requirements

This should be listed in the above test plan and MOP. Please pay special attention to this requirement, and do not proceed into Production unless all the criteria items are met.

Time limit monitoring

Make sure there is enough time to complete the update and to debug issues as necessary.

Success and move forward criteria

Decide ahead of schedule which issues, and how many in each category, are acceptable for the deployment to be considered successful and not rolled back. For example:

  • 0 Critical issues
  • 2 Major issues
  • 5 Minor issues
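
As a minimal sketch, the acceptance criteria above can be written down as an explicit check so that the Go/No-Go decision is mechanical rather than debated during the change window. The threshold values mirror the example list; the function and dictionary names are hypothetical and are not part of Actions Pro.

```python
# Hypothetical limits mirroring the example above; replace with your own criteria.
MAX_ALLOWED = {"critical": 0, "major": 2, "minor": 5}


def go_decision(open_issues, limits=MAX_ALLOWED):
    """Return True (Go) only if every severity count is within its limit.

    Severities missing from open_issues are treated as zero open issues.
    Severities that have no configured limit are ignored by this sketch.
    """
    return all(
        open_issues.get(severity, 0) <= limit
        for severity, limit in limits.items()
    )


if __name__ == "__main__":
    # Counts collected at the end of the test window (illustrative values).
    print("Go" if go_decision({"critical": 0, "major": 1, "minor": 4}) else "No-Go")
```

Agreeing on the limits before the change window is the important part; the code only makes the agreed rule unambiguous.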

Test all possible scenarios in UAT

It is important to test as many runbooks and integrations in UAT as possible. Please note that some of these may require the use of mock data; but the other parts of the integrations and execution should be available for testing.

Any scenarios that are not testable during UAT should be noted, so that they can be prioritized during the Production UAT.

Another way to expedite the testing is to have a set of pre-defined runbooks capable of testing and verifying the system, then to exercise them during system UAT. An example of the script can be found in Load Testing Procedure.

An alternative approach to the above testing scenario is as follows:

  • Have 'Validation check' runbooks that cover all integrations and validate results.
    • These should be simple test runbooks that:
      • Validate all mono-directional and bi-directional integrations
      • Test access to internal target devices. This may involve access through Standalone RSRemotes.
  • Have residual check runbooks that cover base functionality testing.
    • These should complete all testing as needed.
    • For user based processes and dashboards, have a document ready to follow with known data and results to validate operations.
  • Have a baseline load testing method in place to verify stability under high load. Since all servers have multiple automations running concurrently, it is important to verify this as an operation.
    • Point to note: In UAT the load may be lower, but it is still important.
    • In Production, it is important to test at a worst-case load limit.
    • There is a section below in Load Testing Procedure that covers this in more detail.

If the three items above are not available, document a paper-based procedure to accomplish the same items.

Prepare to load test UAT and PROD

This is not an optional component in Production. Actions Pro, like any other software package, frequently changes its underpinning architecture. It is important to validate that the new architecture will allow you to maintain stability after go live. While performing load testing, remember to monitor:

  • CPU Usage
  • I/O (Disk) Usage
  • RSMQ operation
  • Elasticsearch condition
  • External integrations
  • Any Standalone RSRemotes

(If Elasticsearch updated) Backup and Restore or Archive

When updating from Actions Pro 6.0 or earlier to a later version (6.1.x onwards), it’s important to remember that the version of Elasticsearch has also been updated significantly. A data migration strategy is therefore required when planning any updates of this type, to ensure system data integrity.

An important note here is that if you are on 6.0 or earlier and want to go to 7.0, you will have to do this methodical Elasticsearch update just once; however, you need to validate with Customer Support whether any additional update steps are required and whether a straight upgrade is possible.

Here are several alternatives to assist you in mitigating the time loss.

  1. Be willing to lose all Elasticsearch data. This is normally not an alternative.
  2. Turn on archiving and archive all data up to just a few days before the upgrade. This will remove as much data from Elasticsearch as possible and decrease the migration time.
  3. Migrate with no execution data in Elasticsearch and migrate it later. In this scenario you would keep a copy of Actions Pro running with the current Elasticsearch data, to be upgraded at a separate time. Upgrade the production server with a ‘new’ data directory, so no data is there. Once the upgrade is complete and you are successfully in production, upgrade the saved server with the old data, and then follow the procedures to back up and restore Elasticsearch data from the newly upgraded 7.x server to the Production 7.x server (or whichever versions are relevant).

Load Testing Requirements

We have noticed that many customers are not testing systems under load during acceptance testing into Production. Even if there are just a few targets and automations, we advise that a load test is executed, even for a short duration. This will assist all concerned in validating that the update to Actions Pro is not faulting expected operations. We recommend:

  • At least 75% of the expected load volume
  • A minimum run time of 10 minutes
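
To make the two recommendations concrete, the following sketch turns an expected peak rate into a load test target. The input (expected executions per minute) and the function name are assumptions for illustration; use whatever load measure you baseline in your own environment.

```python
import math


def load_test_plan(expected_peak_per_minute, fraction=0.75, minutes=10):
    """Derive a load test target from the recommendations above.

    fraction and minutes default to the recommended minimums; lower values
    are rejected so the plan never falls below the recommendation.
    """
    if fraction < 0.75:
        raise ValueError("use at least 75% of the expected load volume")
    if minutes < 10:
        raise ValueError("run the load test for at least 10 minutes")
    rate = math.ceil(expected_peak_per_minute * fraction)
    return {
        "target_per_minute": rate,
        "duration_minutes": minutes,
        "total_executions": rate * minutes,
    }


if __name__ == "__main__":
    # Example: 200 runbook executions per minute expected at peak.
    print(load_test_plan(200))
```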

Please see Load Testing Procedure for assistance in setting this up and executing.

QA Based - Testing Requirement

The purpose here is to test as much as possible. From the “Test all possible scenarios in UAT” list, test all the integrations available, as well as access to all test and critical runbooks.

You should also be able to do a minimum level load test of critical runbooks, and all user-based wikis.

The more that is tested and validated here, the less risk is involved when migrating to production.

Production Based - Testing Requirement

Prioritize testing in PROD for the wikis, integrations and runbooks that were marked critical first, followed by those not tested in the UAT environment.

Load testing is critical here to make sure all new Actions Pro modules are reacting as needed, that the server(s) resources are not exceeded and when in a cluster, that all servers are working to an equal level and not one server doing 80% of the work.
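
The "one server doing 80% of the work" condition can be checked numerically once you have per-node execution counts from a load test. The sketch below is illustrative only: how the counts are collected, the node names, and the 15-point tolerance are all assumptions.

```python
def busiest_share(executions_by_node):
    """Return (node, share) for the node that handled the most executions."""
    total = sum(executions_by_node.values())
    if total == 0:
        return None, 0.0
    node = max(executions_by_node, key=executions_by_node.get)
    return node, executions_by_node[node] / total


def is_balanced(executions_by_node, tolerance=0.15):
    """True when no node exceeds its fair share by more than `tolerance`.

    The fair share is 1 / number_of_nodes; the tolerance is a hypothetical
    default and should be tuned to your cluster.
    """
    if not executions_by_node:
        return True
    _, share = busiest_share(executions_by_node)
    fair_share = 1 / len(executions_by_node)
    return share <= fair_share + tolerance


if __name__ == "__main__":
    counts = {"node1": 800, "node2": 100, "node3": 100}  # illustrative data
    print(busiest_share(counts), is_balanced(counts))
```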

As mentioned in the Load Testing Requirements section above, use the Linux system performance tools and monitor Elasticsearch and RabbitMQ with a web browser.

Anything that is not working as expected must be agreed upon as an issue you are willing to live with, either for a duration you consider acceptable or until it is corrected.

Version Differences in Detail

Actions Pro 6.3

Actions Pro 6.4

  • Improved Performance and Scalability
    • Up to 10x throughput gain through improved scalability
    • Improved automation performance by reducing calls to databases and optimizing the execution path
    • RSControl load distribution is completely refactored
  • Improved Stability and Up Time
    • Improved product stability by addressing archiving and database driver issues
  • Addressed a significant number of defects including Cross-Site Request Forgery (CSRF) issues and rolling up past hotfixes
  • Simplified Management and Sharing of Content
    • Enhanced Package Manager to manage additional content types
    • New look and feel for the Decision Tree
  • Perform Additional Improvements
    • The Data Collection Service (DCS) service is optimized
    • Changed the way for managing DB connections
  • RSArchive refactoring and archiving of Security Incident Response (SIR)
  • Various stability improvements:
    • Transitive dependencies removed
  • Updated many third-party library versions
  • Improved backup of files during Actions Pro update process
  • Removed Atomikos library and refactored HibernateUtil
  • The Resolve logo is updated according to the new corporate branding
  • The React JavaScript library is updated to the latest version
    • Added a new Artifacts Configuration page, using the new React UI.
  • Linked icon from the Dashboard to the Content Browser
  • CSRFGuard configuration is now added to the blueprint.properties file; changes made in the web.xml file are not honored and may not be preserved
  • Update to blueprint.properties for rsconsole.esb.messagettl, rscontrol.esb.minexecutionevents, and rscontrol.esb.maxexecutionevents parameters in RabbitMQ
  • RSArchive blueprint settings
    • Please be aware that the following property settings have been changed from seconds to days
      • rsarchive.archive.expiry=5
      • rsarchive.archive.cleanup=5
  • Rolling updates are not available for 6.4.0.9+ due to changes in RSMQ settings. This will also require updating all Standalone RSRemotes at the same time. Updating from 6.4.0.9 to a later version of Actions Pro will allow for rolling updates.

All servers and any Standalone RSRemote must be updated to 6.4.0.9 at the same time. This is all or nothing.

Actions Pro 6.5

  • Primary server MUST have RSControl active
  • CSRFGuard configuration now added to the blueprint.properties file. Whitelisted URLs must be migrated to the CSRFGuard section of blueprint.properties. Whitelists in the csrfguard.properties file will not be honored.
    • Update blueprint.properties with rscontrol.sql.active=true prior to the update.
  • Rolling updates are not available for 6.5.0.2 and later versions of 6.5. This will also require updating all Standalone RSRemotes to match the core Actions Pro component versions at the same time

6.5 update


Primary server MUST have RSControl active
Also verify that rscontrol.sql.active=true in the blueprint.properties file
blueprint.properties file: set the new ESB execution events
rscontrol.esb.minexecutionevents=64
rscontrol.esb.maxexecutionevents=64
blueprint.properties file: set the new ESB Time To Live (600,000 ms = 10 min)
rsconsole.esb.messagettl=600000
10 min was the original default, and may fail in an event flood
Monitor RabbitMQ port http://127.0.0.1:15672
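
Because these values are easy to miss or to have overwritten during an update, it can help to verify them from the file itself. The sketch below is an illustration only: it assumes blueprint.properties uses simple `key=value` lines (no line continuations), and the helper names are hypothetical. The expected values are the ones listed above.

```python
# Expected values taken from the 6.5 update notes above.
EXPECTED = {
    "rscontrol.sql.active": "true",
    "rscontrol.esb.minexecutionevents": "64",
    "rscontrol.esb.maxexecutionevents": "64",
    "rsconsole.esb.messagettl": "600000",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_mismatches(props, expected=EXPECTED):
    """Return {key: (found, expected)} for every missing or different value."""
    return {
        key: (props.get(key), want)
        for key, want in expected.items()
        if props.get(key) != want
    }


if __name__ == "__main__":
    sample = "rscontrol.sql.active=true\nrscontrol.esb.maxexecutionevents=40\n"
    for key, (found, want) in find_mismatches(parse_properties(sample)).items():
        print(f"{key}: found {found!r}, expected {want!r}")
```

In practice the text would come from reading the blueprint.properties file of each node; the path depends on your installation.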

Actions Pro 7.0

  • Version updates to major components
    • RabbitMQ 3.6.2 to 3.8.3
    • Tomcat 8.5.23 to 9.0.30
    • Elasticsearch 5.4.0 to 7.6.0
    • Oracle JDK 8 to OpenJDK 13
    • Logstash 5.4.0 to 7.6.0
    • Erlang 18.3 to 22
  • New UI theme for people with visual impairment – High Contrast Theme
  • Groovy Sandboxing mechanism
  • Additional RabbitMQ features
    • Support for RabbitMQ vhost in ESB component
    • Optional additional username/password for Rabbitmq access for remote components
  • Removed Features for 7.0
    • ExtJS features
      • Global Search Results page in ExtJS GUI
      • Advanced Search dialog in ExtJS Search Results Page

Architecture Updates to Consider

RSControl and EXECUTEQUEUE queues

Why is “EXECUTEQUEUE” now suggested instead of “RSCONTROL”, which Resolve initially recommended for automated processes?

The level of effort involved in this change is listed below in detail:

Change any Action Task based ESB calls and filter script calls:

From: ESB.sendMessage("RSCONTROL", "MAction.executeProcess", options, params)

To: ESB.sendMessage("EXECUTEQUEUE", "MAction.executeProcess", options, params)
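
If many scripts need this change, a search-and-replace pass over exported script text can locate the calls. The following is a rough sketch, not a Resolve tool: it assumes the scripts are available as plain text, and it only recognizes the double-quoted form shown above. Review every proposed change manually.

```python
import re

# Matches ESB.sendMessage("RSCONTROL", "MAction.executeProcess" ... only.
# Other messages sent to RSCONTROL are deliberately left untouched.
PATTERN = re.compile(
    r'(ESB\.sendMessage\(\s*")RSCONTROL("\s*,\s*"MAction\.executeProcess")'
)


def switch_to_executequeue(script):
    """Return (updated_script, number_of_calls_changed)."""
    return PATTERN.subn(r"\g<1>EXECUTEQUEUE\g<2>", script)


if __name__ == "__main__":
    before = 'ESB.sendMessage("RSCONTROL", "MAction.executeProcess", options, params)'
    after, changed = switch_to_executequeue(before)
    print(changed, after)
```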

The two queues can be explained as:

RSCONTROL

Has no controls to limit the number of processes it can dispatch at any one time; as a result, it can take on more work than it can handle. With the update from 6.1 to 6.4 and the improved communication on RSControl, execution is now faster, and this is resulting in timeouts (aborts) and jobs failing to initiate.

EXECUTEQUEUE

Has controls to limit the number of processes it dispatches at any one time. This is the same queuing and throttling facility used in both 6.1 and 6.4.

As a result, it may take longer to clear the queue, but processes will wait in the queue as expected with no negative ramifications. Using this option also allows for:

  1. Spawning a new task pool for this process
  2. Starting a new timeout for the runbook, based on the new START time of this new execution thread

This may best be explained by looking at some of the internals of Actions Pro in block diagram form.

In the following diagram (Actions Pro 6.3 and prior), you can see how all requests go into any gateway. These requests then go through the ‘EXECUTEQUEUE’, and eventually to another internal queuing mechanism inside Actions Pro. All requests to ‘RSCONTROL (queue)’ were also directed into this internal mechanism. This allowed for runbook execution throttling as all runbooks called went through this intermediary queuing system.

Actions Pro 6.3 and earlier Queue execution paths

In the following diagram (Actions Pro 6.4 and later), you can see how all requests go into any gateway. These requests then go through the ‘EXECUTEQUEUE’. All requests to ‘RSCONTROL (queue)’ are acted upon directly and avoid Rabbit MQ.

Actions Pro 6.4 and later Queue execution paths

Actions Pro Components and Overhead

Included below is a suggested update to an HA architecture. While distribution and load will vary by customer and by the runbooks and integrations in use, one of our typical test runs revealed that setting the primary location of each process, along with a few parameters, provided a significant performance increase.

This testing was done on Actions Pro 6.4.0.9, and also included ‘Flood’ testing where 10 times the highwater mark number of alerts were received.

Possible Server Configuration

With the typical 3-node cluster setup, we moved processes such as gateways and primary functions between nodes. CPU and memory usage were validated during load testing, both with external gateway requests and while executing runbooks with the procedure described in Load Testing Procedure.

It should be noted that during our testing, all nodes had 6 CPU cores and 64 GB of RAM.

The specifics on the process execution servers and memory setting are all listed below. We would advise you to monitor your servers and adjust these as needed both during the load and flood testing as well as normal operations.

blueprint.properties File

rscontrol.esb.maxexecutionevents=40

|(module)|(Xms / Xmx)|
|rsmq|256 / 512|
|rscontrol.run|4096 / 4096|
|rsview.run|512 / 1024|
|rsremote.run|2048 / 4096|
|rsmgmt.run|64 / 512|
|rssearch.run|8192 / 8192|
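
As a quick sanity check, the maximum values in the table can be totalled and compared with the RAM of a node. This sketch assumes the values are in megabytes and, as a worst case, that every listed component runs on the same node; neither assumption is stated in the source.

```python
# (Xms, Xmx) pairs from the table above, assumed to be in MB.
MEMORY_MB = {
    "rsmq": (256, 512),
    "rscontrol.run": (4096, 4096),
    "rsview.run": (512, 1024),
    "rsremote.run": (2048, 4096),
    "rsmgmt.run": (64, 512),
    "rssearch.run": (8192, 8192),
}


def total_max_mb(settings):
    """Sum the maximum (Xmx) value of every component."""
    return sum(xmx for _xms, xmx in settings.values())


def fraction_of_ram(settings, ram_gb):
    """Share of physical RAM consumed if every component reaches its maximum."""
    return total_max_mb(settings) / (ram_gb * 1024)


if __name__ == "__main__":
    print(total_max_mb(MEMORY_MB), "MB,", f"{fraction_of_ram(MEMORY_MB, 64):.0%} of 64 GB")
```

The remainder is left for the operating system, file system cache, and any other processes on the node.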

Deployed Components

This is a 3-node cluster configured to run:

Node 1 | Node 2 | Node 3
RSMQ (Secondary) | RSMQ (Primary)
RSControl | RSControl | RSControl
RSView | RSView | RSView
RSRemote, All Gateways (Primary) | RSRemote | RSRemote, All Gateways (Primary)
RSMGMT | RSMGMT | RSMGMT
RSArchive
Logstash
DCS
RDBMS (1) | RDBMS (2)
Oracle RDBMS | Oracle RDBMS

Putting Elasticsearch into asynchronous mode

caution

Only do this if Elasticsearch is used in an HA cluster.

Change the Elasticsearch setting below so that Elasticsearch responds to Resolve as quickly as possible.

Execute (at OS)

  • curl -XPUT 'http://<IPaddress>:9200/_all/_settings?preserve_existing=true' -d '{"index.translog.durability" : "async"}'

Expected response {"acknowledged":true}
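
The same request can be built in a script, which makes it easier to review before it is sent. This is a sketch under assumptions: the host value is a placeholder, and the explicit `Content-Type: application/json` header is included because Elasticsearch 6.0 and later reject request bodies without it (the 7.x releases bundled with Actions Pro 7.0 fall in that range).

```python
import json
import urllib.request


def build_async_translog_request(host, port=9200):
    """Build (but do not send) the PUT request that sets async translog durability."""
    url = f"http://{host}:{port}/_all/_settings?preserve_existing=true"
    body = json.dumps({"index.translog.durability": "async"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def is_acknowledged(response_text):
    """True when the response body matches the expected {"acknowledged": true}."""
    return json.loads(response_text).get("acknowledged") is True


if __name__ == "__main__":
    request = build_async_translog_request("127.0.0.1")  # placeholder host
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request).read().decode("utf-8")
```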

Working with Elasticsearch (RSSearch)

This appendix is designed to give you the information you may need for the types of functions utilized in an upgrade. It is not designed to perform a step by step setup and backup execution.

There are several Elasticsearch commands that you will need if you have to make a snapshot of this module. Note that the status and backup validation commands apply to the full Elasticsearch cluster and do not need to be repeated per server.

The commands are to check for:

  • General availability
  • Elasticsearch Health
  • Validation of a Backup Directory
  • Elasticsearch Snapshot Execution
  • Elasticsearch Snapshot Validation

Elasticsearch availability

Validate that Elasticsearch responds in a web browser

Execute (in browser)

  • http://<IPaddress>:9200

Expected response

Elasticsearch Health

Validate the Elasticsearch health response in a web browser

Execute (in browser)

  • http://<IPaddress>:9200/_cluster/health?pretty=true

Expected response
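
The health endpoint returns a JSON document. As a minimal sketch of how it might be evaluated, the function below inspects the standard `status`, `timed_out`, `unassigned_shards`, and `number_of_nodes` fields of the cluster health response. The function name and the choice to treat anything other than green as a problem are assumptions; on a single-node installation a yellow status can be normal.

```python
from typing import List, Optional


def health_problems(health: dict, expected_nodes: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems found in a cluster health response."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status!r}, expected 'green'")
    if health.get("timed_out"):
        problems.append("health request timed out")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    if expected_nodes is not None and health.get("number_of_nodes") != expected_nodes:
        problems.append(
            f"{health.get('number_of_nodes')} nodes present, expected {expected_nodes}"
        )
    return problems


if __name__ == "__main__":
    sample = {"status": "yellow", "timed_out": False,
              "number_of_nodes": 2, "unassigned_shards": 5}  # illustrative data
    for problem in health_problems(sample, expected_nodes=3):
        print(problem)
```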


Validation of a Backup Directory

Verify that a snapshot directory exists.

Execute (at OS)

  • curl -XGET 'http://<IPaddress>:9200/_snapshot/resolve_backup?pretty=true'

Expected response

{
"resolve_backup" : {
"type" : "fs",
"settings" : {
"compress" : "true",
"location" : "/opt/resolve/elasticsearch/backup"
}
}
}

Elasticsearch Snapshot Execution

Execute the curl statement to initiate the process. Validate that the call completes as expected and that the state is ‘SUCCESS’, as shown in the expected response.

Execute (at OS)

  • curl -XPUT 'http://<IPaddress>:9200/_snapshot/resolve_backup/resolvesnapshot_20200115?wait_for_completion=true&pretty=true'

Expected response

{
"snapshot" : {
"snapshot" : "resolvesnapshot_20200115",
"uuid" : "AJitjCK8SO-s9KO_Lp7nKg",
"version_id" : 5040199,
"version" : "5.4.1",
"indices" : [
"worksheet_20191129",
………… More removed for space requirements …………
"taskresult_20191130"
],
"state" : "SUCCESS",
"start_time" : "2020-01-17T16:09:21.697Z",
"start_time_in_millis" : 1579277361697,
"end_time" : "2020-01-17T16:09:46.913Z",
"end_time_in_millis" : 1579277386913,
"duration_in_millis" : 25216,
"failures" : [ ],
"shards" : {
"total" : 1060,
"failed" : 0,
"successful" : 1060
}
}
}

Elasticsearch Snapshot Validation

Verify that the backup exists.

Execute (at OS)

  • curl -XGET 'http://<IPaddress>:9200/_snapshot/resolve_backup/_all?pretty'

Expected response

{
"snapshots" : [
{
"snapshot" : "resolvesnapshot_20200115",
"uuid" : "AJitjCK8SO-s9KO_Lp7nKg",
"version_id" : 5040199,
"version" : "5.4.1",
"indices" : [
"worksheet_20191129",
………… More removed for space requirement…………
],
"state" : "SUCCESS",
"start_time" : "2020-01-16T19:39:37.792Z",
"start_time_in_millis" : 1579203577792,
"end_time" : "2020-01-16T19:40:50.145Z",
"end_time_in_millis" : 1579203650145,
"duration_in_millis" : 72353,
"failures" : [ ],
"shards" : {
"total" : 1040,
"failed" : 0,
"successful" : 1040
}
}
]
}
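
When several snapshots exist, checking each entry by eye is error-prone. The sketch below flags any snapshot that did not finish cleanly. It assumes the listing response wraps its entries in a `snapshots` array, as Elasticsearch returns for the `_all` request, and it uses only fields visible in the expected response. The function names are hypothetical.

```python
def snapshot_ok(snapshot):
    """True when a snapshot entry finished with SUCCESS and no failed shards."""
    shards = snapshot.get("shards", {})
    return (
        snapshot.get("state") == "SUCCESS"
        and not snapshot.get("failures")
        and shards.get("failed") == 0
        and shards.get("successful") == shards.get("total")
    )


def failed_snapshots(listing):
    """Return the names of all snapshots in the listing that are not clean."""
    return [
        entry.get("snapshot")
        for entry in listing.get("snapshots", [])
        if not snapshot_ok(entry)
    ]


if __name__ == "__main__":
    listing = {"snapshots": [{
        "snapshot": "resolvesnapshot_20200115",
        "state": "SUCCESS",
        "failures": [],
        "shards": {"total": 1040, "failed": 0, "successful": 1040},
    }]}
    print(failed_snapshots(listing) or "all snapshots look complete")
```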

Working with RSMQ (Rabbit MQ)

Monitoring RSMQ / RabbitMQ is important for verifying the growth and buffering of the RSCONTROL and EXECUTEQUEUE queues. This is done with a web browser pointed to the primary RS:

http://127.0.0.1:15672

An additional item to note is the TTL setting in the blueprint.properties file. The default is 10 minutes. The ability to set this parameter was included in 6.4.0.9 and later. The 'messagettl=' parameter is specified in milliseconds, so use 600000 for 10 minutes:

rsmq.messagettl=600000 (600,000 milliseconds, at 60,000 milliseconds per minute)

Like any other parameter, you will need to stop the process, update the blueprint.properties file, execute config.sh, and then restart the process.

After restarting you will get an additional message component on startup like:

Setting policy "federate-me" for pattern "." to "{\"ha-mode\":\"all\" ,\"message-ttl\":600000}" with priority "0" ...

The figure below shows how to validate the rsmq.messagettl value once the server is restarted

Once set you can load test and now look at the monitoring page, and note as you are executing the load test, the following is happening:

Overview / Totals Page

  1. Ready
    1.1. Ready is the number of new requests that have yet to be processed. As you exceed the number of simultaneous executions in RSControl, this number will start to increase.
    1.2. Once all load executions are started, this number will max out and then decrease to 0.
  2. Unacked
    2.1. This is the number of current executions.
    2.2. You need to make note of this prior to executing any load test, as this is the safe number to return to at the end of the test.
    2.3. This number will increase to the MAX number of executions allowed for RSControl, and is a SUM of all RSControl instances, plus some additional processing. The additional processing is the number denoted in step 2.1.
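
The same two counters are exposed by the RabbitMQ management HTTP API (`/api/queues`) as `messages_ready` and `messages_unacknowledged`, so they can be tracked by a script during a load test. The sketch below only processes an already-fetched queue list; the queue names are taken from this document and may differ from the names RabbitMQ actually reports in your installation.

```python
def queue_totals(queues, names=("RSCONTROL", "EXECUTEQUEUE")):
    """Sum Ready and Unacked counts for the named queues.

    `queues` is the decoded JSON list returned by GET /api/queues.
    """
    totals = {"ready": 0, "unacked": 0}
    for queue in queues:
        if queue.get("name") in names:
            totals["ready"] += queue.get("messages_ready", 0)
            totals["unacked"] += queue.get("messages_unacknowledged", 0)
    return totals


def has_drained(totals, baseline_unacked):
    """True once Ready is back to 0 and Unacked is back at the pre-test baseline."""
    return totals["ready"] == 0 and totals["unacked"] <= baseline_unacked


if __name__ == "__main__":
    sample = [  # illustrative data, not real output
        {"name": "EXECUTEQUEUE", "messages_ready": 120, "messages_unacknowledged": 40},
        {"name": "RSCONTROL", "messages_ready": 0, "messages_unacknowledged": 3},
    ]
    totals = queue_totals(sample)
    print(totals, has_drained(totals, baseline_unacked=3))
```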

Additional Validation points

  • Process Debug - validation
    • This should be OFF / False for all runbooks. Runbooks left in debug mode will have a significant impact on resources in a production environment. CPU and memory can be heavily taxed by processing unnecessary debug data. We have an actiontask to produce these results. Please review the related KB article for instructions.
  • ORG - definition
    • If Org or Multi-orgs is set up in Actions Pro, identify which automations are assigned to each Org. The results are only displayed to the users/groups who have rights to those Organizations.
  • Netcool Gateway
    • SQL validator may reject queries that contain newline characters

      To correct this, update /rsremote/config/validation.properties on all RSRemotes and set Validator.HQLSQLSafeString=(?s)^.*$

      Setting to be added to blueprint.properties in a future release
  • UI
    • Ensure all users have closed all browsers and cleared the browser cache before logging in again
    • Expect custom ExtJS UIs to have issues including fields no longer visible, items misaligned, colors changed and other various rendering changes.
    • If using Ajax to make requests to external systems, the external system may need to add the OWASP_CSRFTOKEN header to its Access-Control-Allow-Headers response header that is used in response to the preflight request.
    • WSDATA values that contain the words 'error' or 'failure' retrieved using WSDATA_FLAG=true on an Ajax request result in a false error response to the request. (Fixed as of 6.4.0.12)
  • Nginx Reverse Proxy
    • If using Nginx reverse proxy, ensure that the 'underscores_in_headers' directive is used and set to 'on'. (Needed for OWASP_CSRFTOKEN)
    • Nginx Documentation
  • DR Update from 5.3.1
    • Requires ES migration before the update to stand up the new ES.
    • Run the update with --skip-sql --no-import --no-migrate --no-restart.
    • --no-migrate and --no-restart both prevent ES migration from happening.
  • Image resizing
    • With the move to the React framework, images are not automatically resized the same way. Therefore, images may need to be adjusted when upgrading past Actions Pro 6.3.
  • CSRFGuard
    • In 6.4 this configuration was moved to blueprint.properties. Although this is documented, it is often overlooked; any configuration left in the csrfguard.properties file will be overwritten.
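For the Netcool Gateway item above, the same property has to be set on every RSRemote, so a small helper can make the edit repeatable. A minimal sketch, assuming the file uses plain key=value lines; set_property is a hypothetical helper and not an Actions Pro tool.

```shell
#!/usr/bin/env bash
# Set key=value in a properties file, replacing any existing definition of the key.
# set_property is a hypothetical helper, not an Actions Pro tool.
set_property() {
  sp_file=$1
  sp_key=$2
  sp_value=$3
  sp_tmp=$(mktemp) || return 1
  # Keep every line that does not start with "key=", then append the new definition.
  awk -v k="$sp_key=" 'index($0, k) != 1' "$sp_file" > "$sp_tmp"
  printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_tmp"
  # Write back into the original file so its owner and permissions are kept.
  cat "$sp_tmp" > "$sp_file"
  rm -f "$sp_tmp"
}

# Example, to be repeated on each RSRemote:
# set_property /rsremote/config/validation.properties Validator.HQLSQLSafeString '(?s)^.*$'
```

The value is passed in single quotes so the shell does not interpret the `$` or parentheses in the pattern.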

Load Testing Procedure

Load testing is in addition to functional testing. We have observed cases where changes in the underpinning infrastructure introduce situations and/or timing issues, so that something that worked before fails to work after the upgrade.

The type and volume of transactions that make up the load runbooks vary based on your system utilization. For instance, if your server handles about 10,000 events per day (about 7 per minute) from a monitoring system such as Netcool, a proper load level would be 10 to 20 per minute. You can also monitor the number of runbooks per day and use that as the base number. Below is the calculation used to determine load.

  • 10,000 events / 24 hours / 60 minutes = 6.944 events / minute
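The same calculation can be repeated for your own daily volume. A small sketch in shell (events_per_minute is an illustrative name):

```shell
#!/usr/bin/env bash
# Average events per minute from a daily event count.
events_per_minute() {
  awk -v daily="$1" 'BEGIN { printf "%.3f\n", daily / 24 / 60 }'
}

events_per_minute 10000   # prints 6.944
```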

The easiest way to load test is to utilize an integration to feed large numbers of activities into Actions Pro in a short period of time. There are a number of tools available for this like ‘JMeter’; however, they may not be practical for all situations.

What has been proved to work is to:

  • Enable the Actions Pro Webservice Listener
  • Utilize existing runbooks or create runbooks that will exercise the system as desired
  • Create an OS level script that will loop for the requested number of times and submit the runbook for execution.

Example ‘Load Execution Script’

The easiest way to create this was to lock in an ID and Password to execute with, and add a single runbook. If you want to run several different runbooks, either add them as a second ‘curl’ command to the same script or create a different script, which will also allow you to have a different number of executions per runbook.

The script also shows examples of passing parameters into the runbook.
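The following is a rough sketch of what such a script could look like, not the original script. The listener URL, credentials, runbook name, and form field names are all placeholders or hypothetical; take the real values from your own Webservice Listener configuration. It defaults to a dry run that only prints the curl command it would issue.

```shell
#!/usr/bin/env bash
# Sketch of a load execution script: submit one runbook per interval, N times.
# Every value marked "placeholder" must be replaced; none is taken from Actions Pro documentation.
LISTENER_URL=${LISTENER_URL:-"https://actionspro.example.com/PLACEHOLDER"}  # placeholder
API_USER=${API_USER:-"loadtest"}            # placeholder ID
API_PASS=${API_PASS:-"changeme"}            # placeholder password
RUNBOOK=${RUNBOOK:-"Namespace.MyRunbook"}   # placeholder runbook name
INTERVAL=${INTERVAL:-1}                     # seconds between submissions
DRY_RUN=${DRY_RUN:-1}                       # 1 = print the curl command instead of running it

submit_runbook() {
  # RUNBOOK and SEQUENCE are hypothetical field names showing how parameters could be passed.
  set -- curl --silent --show-error --user "$API_USER:$API_PASS" \
    --data-urlencode "RUNBOOK=$RUNBOOK" --data-urlencode "SEQUENCE=$1" "$LISTENER_URL"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # note: the printed command includes the credentials
  else
    "$@"
  fi
}

run_load() {
  load_count=$1
  load_i=1
  while [ "$load_i" -le "$load_count" ]; do
    submit_runbook "$load_i"
    sleep "$INTERVAL"
    load_i=$((load_i + 1))
  done
}

# Run only when a count is given on the command line, e.g. ./starting.sh 10
if [ "$#" -gt 0 ]; then
  run_load "$1"
fi
```

To run a second runbook with a different number of executions, one option is a second copy of the script with its own RUNBOOK value.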

Starting the ‘Load Execution Script’

Execute it once and validate the results are as expected. The items you want to verify are:

  • The number of worksheets that were created as expected
  • The start times of all worksheets are as expected
  • The execution duration of the runbooks executed are all as expected

In our example the calling script is starting 10 occurrences of the targeted runbook. This was executed like:

$ ./starting.sh 10

By executing in immediate mode as shown, you can validate that everything returns OK without errors, as well as the time taken to start. Since the script starts a runbook and does not wait for results, it should take a predictable amount of time. In our case, about 10 to 12 seconds.

Once you are certain all the runbooks are as expected, you can ramp up the number of occurrences. The script submits at most 1 runbook per second, so if you wish to execute more than that, run it twice to get two times the submissions. In our example we run 2 processes, each submitting 1 runbook per second for 2 minutes, for a combined 2 runbooks per second, or 240 runbooks over a 2 minute duration. The example shows executing the script in the background, so that while it is still inserting into Actions Pro you can execute additional scripts, or just monitor the system. The shell will notify you of the process spawned to execute the script as well as the completion of the script.

$ ./starting.sh 120 1>/dev/null 2>&1 &
[1] 20287
$ ./starting.sh 120 1>/dev/null 2>&1 &
[2] 20380
$
[1]- Done
[2]- Done
$
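The two background invocations above can also be wrapped in a loop that waits for all of them to finish. A minimal sketch, assuming the load script is ./starting.sh as in this example; run_parallel is an illustrative name.

```shell
#!/usr/bin/env bash
# Start several copies of a load command in the background and wait for all of them.
# LOAD_CMD defaults to the ./starting.sh script used in this document.
LOAD_CMD=${LOAD_CMD:-./starting.sh}

run_parallel() {
  procs=$1
  per_process=$2
  n=0
  while [ "$n" -lt "$procs" ]; do
    "$LOAD_CMD" "$per_process" >/dev/null 2>&1 &
    n=$((n + 1))
  done
  wait    # returns once every background job has finished
  echo "requested $((procs * per_process)) runbooks from $procs processes"
}

# Example: run_parallel 2 120   # 2 processes x 120 submissions = 240 runbooks
```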

Backup and Recovery concerns

Actions Pro Components that have dynamic data:

Default Actions Pro Home Directory:

-> /opt/resolve/

-> bin

No dynamic data to back up in this directory set. Should be backed up on a regular basis (e.g. weekly)

-> dcs/

This module may not be activated unless you are using the Security Incident Request module in Actions Pro

-> elasticsearch/ (aka RSSearch)

data/*

Contains RSSearch data and needs backup; contains all non-archived execution data

logs

Contains LOG data

backup/*

Contains snapshot data (momentary look at the ES Data when executed)

-> gatewaylibs

No dynamic data to back up in this directory set. Should be backed up on a regular basis (e.g. weekly)

-> Jdk

No dynamic data to back up in this directory set. Should be backed up on a regular basis (e.g. weekly)

-> logstash/

This module may not be activated

data/*

Contains Data and needs backup and contains all non-archived execution data

logs

Contains LOG data

-> rabbitmq/ (aka RSMQ)

log

Contains LOG data

-> rsarchive/

This module may not be activated

log

Contains LOG data

-> rsconsole/

log

Contains LOG data

-> rscontrol/

log

Contains LOG data

-> rsmgmt/

log

-> rsremote/

log

Contains LOG data

-> tomcat/

log

Contains LOG data

As can be seen, within the Actions Pro install directories there are at most 3 directories that contain dynamic data. They are:

  • <actions-pro-home>/elasticsearch/data
  • <actions-pro-home>/elasticsearch/backup
  • <actions-pro-home>/logstash/data

Of the two modules involved (Elasticsearch and Logstash), normally only Elasticsearch is enabled. Note that losing log files would not keep you from reloading and starting Actions Pro.
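A quick way to see which of the three dynamic-data directories listed above are actually present on a given server is to test for them. A minimal sketch, assuming the default home of /opt/resolve; list_dynamic_dirs is an illustrative name.

```shell
#!/usr/bin/env bash
# Report which of the three dynamic-data directories exist under an Actions Pro home.
list_dynamic_dirs() {
  ap_home=${1:-/opt/resolve}
  for sub in elasticsearch/data elasticsearch/backup logstash/data; do
    if [ -d "$ap_home/$sub" ]; then
      echo "present $ap_home/$sub"
    else
      echo "missing $ap_home/$sub"
    fi
  done
}

list_dynamic_dirs /opt/resolve
```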

Warning: you can only back up the Elasticsearch data directory directly if all servers in the Elasticsearch cluster are off at the same time. To back up Elasticsearch data otherwise, there is a snapshot facility that can be employed; the backup directory mentioned above is the suggested location for snapshots and should itself be backed up. While it is not dynamic, it is a backup of the execution data at the time a snapshot was last requested.

Backing up the Actions Pro environment:

Actions Pro does not require the server being shut down for backup. There are a few items to remember:

  1. All operational data is maintained in the RDBMS (MySQL or Oracle), and backups should be executed based on the vendor's suggestions
  2. Actions Pro’s executable directories (/opt/resolve) only need a backup after an update or change, although many customers back this directory set up on a weekly basis
  3. If the Elasticsearch data is critical, follow the suggested backup procedure to create and register a backup directory, then take a snapshot nightly (or as often as you desire) and back up that directory (/opt/resolve/elasticsearch/backup by default, as listed above). To repeat this point: if you have a cluster of 3 servers, each running a copy of RSSearch, you only need to take the snapshot on one server, as all 3 contain the same data, and a restore will restore all servers.
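As an illustration of the snapshot step in item 3, the sketch below uses the standard Elasticsearch snapshot REST API to register a filesystem repository and take a snapshot. The URL, the repository name, and the absence of authentication are assumptions to check against your RSSearch configuration, and Elasticsearch generally also requires the location to be listed under path.repo in its own configuration. The script defaults to a dry run that only prints the requests.

```shell
#!/usr/bin/env bash
# Sketch: register a filesystem snapshot repository and take a snapshot.
# ES_URL and REPO are assumptions, not values from the Actions Pro documentation.
ES_URL=${ES_URL:-http://localhost:9200}
REPO=${REPO:-resolve_backup}                                  # hypothetical repository name
BACKUP_DIR=${BACKUP_DIR:-/opt/resolve/elasticsearch/backup}
DRY_RUN=${DRY_RUN:-1}                                         # 1 = print requests instead of sending them

es_put() {
  # $1 = path under ES_URL, $2 = optional JSON body
  if [ "$DRY_RUN" = "1" ]; then
    echo "PUT $ES_URL/$1${2:+ $2}"
  elif [ -n "$2" ]; then
    curl --silent --show-error -X PUT -H 'Content-Type: application/json' --data "$2" "$ES_URL/$1"
  else
    curl --silent --show-error -X PUT "$ES_URL/$1"
  fi
}

register_repo() {
  es_put "_snapshot/$REPO" "{\"type\":\"fs\",\"settings\":{\"location\":\"$BACKUP_DIR\"}}"
}

take_snapshot() {
  # $1 = snapshot name; wait_for_completion makes the call block until the snapshot is done
  es_put "_snapshot/$REPO/$1?wait_for_completion=true"
}

register_repo
take_snapshot "nightly_$(date +%Y%m%d)"
```

Once the snapshot call returns, the backup directory can be picked up by your regular file-level backup.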

Restoring the Actions Pro environment:

Listed below are the 3 areas to restore and the general procedure for each:

  1. RDBMS Data
    1. When a backup was created the RDBMS should have been put into a ‘log’ state. Once the base namespace is assigned, a standard restore will work.
    2. During this operation, Actions Pro will not be running
    3. If this is a new installation of the DB server, you may have to increase the max size of a loadable file, along with increasing the number of connections allowed. These are documented in the Installation in Detail document for Actions Pro.
  2. Actions Pro’s executable directories (/opt/resolve)
    1. While this depends on the method of backup, a simple restore using the same utility will work.
    2. Actions Pro has no special requirements other than the ‘Resolve OS Administrative’ user to be the owner and group of the directory set
    3. If this is a clean / new OS install, and not a restore, remember to follow the installation procedure and, as root, execute /opt/resolve/bin/server_setup.sh to restore the server. This is documented in the Installation in Detail document for Actions Pro, and will have to be performed on all servers in the cluster.
  3. Elasticsearch data
    1. After restoring Actions Pro, the Elasticsearch data may be restored. This can happen without stopping Actions Pro.
    2. Follow the directions in the Elasticsearch document.